Yield estimation and control

ABSTRACT

A defect prediction method for a device manufacturing process involving production substrates processed by a lithographic apparatus, the method including training a classification model using a training set including measured or determined values of a process parameter associated with the production substrates processed by the device manufacturing process and an indication regarding existence of defects associated with the production substrates processed in the device manufacturing process under the values of the process parameter, and producing an output from the classification model that indicates a prediction of a defect for a substrate.

This application is a continuation of U.S. patent application Ser. No. 16/851,477, filed on Apr. 17, 2020, now allowed, which is a continuation of U.S. patent application Ser. No. 15/104,517, filed on Jun. 14, 2016, now U.S. Pat. No. 10,627,723, which is the U.S. national phase entry of PCT patent application no. PCT/EP2014/074664, filed on Nov. 14, 2014, which claims the benefit of priority of U.S. provisional patent application No. 61/917,305, filed on Dec. 17, 2013; each of the foregoing applications is incorporated herein in its entirety by reference.

TECHNICAL FIELD

The description herein relates to lithographic apparatuses and processes, and more particularly to a tool to predict and correct defects so as to increase the yield.

BACKGROUND

A lithographic apparatus can be used, for example, in the manufacture of integrated circuits (ICs) or other devices. In such a case, a patterning device (e.g., a mask) may contain or provide a circuit pattern corresponding to an individual layer of the device (“design layout”), and this circuit pattern can be transferred onto a target portion (e.g. comprising one or more dies) on a substrate (e.g., silicon wafer) that has been coated with a layer of radiation-sensitive material (“resist”), by methods such as irradiating the target portion through the circuit pattern on the patterning device. In general, a single substrate contains a plurality of adjacent target portions to which the circuit pattern is transferred successively by the lithographic apparatus, one target portion at a time. In one type of lithographic apparatus, the circuit pattern on the entire patterning device is transferred onto one target portion in one go; such an apparatus is commonly referred to as a wafer stepper. In an alternative apparatus, commonly referred to as a step-and-scan apparatus, a projection beam scans over the patterning device in a given reference direction (the “scanning” direction) while synchronously moving the substrate parallel or anti-parallel to this reference direction. Different portions of the circuit pattern on the patterning device are transferred to one target portion progressively.

Prior to the device fabrication procedure of transferring the circuit pattern from the patterning device to the substrate of the device manufacturing process, the substrate may undergo various device fabrication procedures of the device manufacturing process, such as priming, resist coating and a soft bake. After exposure, the substrate may be subjected to other device fabrication procedures of the device manufacturing process, such as a post-exposure bake (PEB), development, and a hard bake. This array of device fabrication procedures is used as a basis to make an individual layer of a device, e.g., an IC. The substrate may then undergo various device fabrication procedures of the device manufacturing process such as etching, ion-implantation (doping), metallization, oxidation, chemo-mechanical polishing, etc., all intended to finish off the individual layer of the device. If several layers are required in the device, then the whole process, or a variant thereof, is repeated for each layer. Eventually, a device will be present in each target portion on the substrate. If there is a plurality of devices, these devices are then separated from one another by a technique such as dicing or sawing, whence the individual devices can be mounted on a carrier, connected to pins, etc.

SUMMARY

Disclosed herein is a computer-implemented defect prediction method for a device manufacturing process involving production substrates processed by a lithographic apparatus, the method comprising:

training a classification model using a training set comprising measured or determined values of a process parameter associated with the production substrates processed by the device manufacturing process and an indication regarding existence of defects associated with the production substrates processed in the device manufacturing process under the values of the process parameter; and

producing an output from the classification model that indicates a prediction of a defect for a substrate.

Disclosed herein is a method of training a classification model, the method comprising:

predicting a defect in or on a substrate using the classification model, the classification model having, as an independent variable, a process parameter of a device manufacturing process for lithographically exposed substrates and/or a layout parameter of a pattern to be provided on a substrate using a lithographic apparatus;

receiving information regarding existence of a defect for a measured or determined value of the process parameter and/or layout parameter; and

training the classification model based on the predicted defect and the information regarding existence of the defect for the measured or determined value of the process parameter and/or layout parameter.

Disclosed herein is a computer-implemented method of producing a classification model to facilitate defect prediction in a device manufacturing process involving production substrates processed by a lithographic apparatus, the method comprising training the classification model using a training set comprising measured or determined values of a process parameter of a plurality of substrates processed by the device manufacturing process and an indication regarding existence of defects associated with the values of the process parameter.

Disclosed herein is a computer program product comprising a computer-readable medium having instructions recorded thereon, the instructions, when executed by a computer, implementing a method as described herein.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of various subsystems of a lithography system.

FIG. 2 schematically depicts a method of predicting defects in a device manufacturing process.

FIG. 3 is a block diagram of simulation models.

FIG. 4 schematically shows prediction of a process window of a layout.

FIG. 5 schematically depicts a method of predicting defects in a device manufacturing process, according to an embodiment.

FIG. 6 schematically depicts a method of retraining a classification model.

FIG. 7 shows an exemplary classification model as trained by a training set.

FIG. 8 is a block diagram of an example computer system.

FIG. 9 is a block diagram of a model predictive control system.

FIG. 10 is a schematic diagram of a lithographic projection apparatus.

FIG. 11 is a schematic diagram of another lithographic projection apparatus.

FIG. 12 is a more detailed view of the apparatus in FIG. 11.

FIG. 13 schematically depicts an embodiment of a lithographic cell or cluster.

DETAILED DESCRIPTION

Although specific reference may be made in this text to the manufacture of ICs, it should be explicitly understood that the description herein has many other possible applications. For example, it may be employed in the manufacture of integrated optical systems, guidance and detection patterns for magnetic domain memories, liquid-crystal display panels, thin-film magnetic heads, etc. The skilled artisan will appreciate that, in the context of such alternative applications, any use of the terms “reticle”, “wafer” or “die” in this text should be considered as interchangeable with the more general terms “mask”, “substrate” and “target portion”, respectively.

In the present document, the terms “radiation” and “beam” are used to encompass all types of electromagnetic radiation, including ultraviolet radiation (e.g. with a wavelength of 365, 248, 193, 157 or 126 nm) and EUV (extreme ultra-violet radiation, e.g. having a wavelength in the range 5-20 nm).

The terms “optimizing” and “optimization” as used herein mean adjusting an apparatus, e.g., a lithographic projection apparatus, such that device fabrication results and/or processes (e.g., of lithography) have one or more desirable characteristics, such as higher accuracy of projection of a design layout on a substrate, a larger process window, etc.

As a brief introduction, FIG. 1 illustrates an exemplary lithographic projection apparatus 10A. Major components include:

- illumination optics, which define the partial coherence (denoted as sigma) and which may include optics 14A, 16Aa and 16Ab that shape radiation from a radiation source 12A. The radiation source 12A may be a deep-ultraviolet excimer laser source or another type of source, including an extreme ultraviolet (EUV) source. As discussed herein, the lithographic projection apparatus itself need not have the radiation source.
- optics 16Ac that project an image of a patterning device pattern of a patterning device 18A onto a substrate plane 22A.

An adjustable filter or aperture 20A at the pupil plane of the projection optics may restrict the range of beam angles that impinge on the substrate plane 22A. The largest possible angle defines the numerical aperture of the projection optics, $\mathrm{NA} = \sin(\theta_{\max})$.
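As a quick numerical check of the relation above, the following Python sketch evaluates NA = sin(θ_max) for a given maximum beam angle. Like the text, it takes the surrounding medium to be air, with refractive index 1. The function name `numerical_aperture` is illustrative only.

```python
import math


def numerical_aperture(theta_max_deg: float) -> float:
    """Return NA = sin(theta_max) for the largest beam angle, in degrees,
    passed by the pupil-plane aperture (medium index assumed to be 1)."""
    if not 0.0 <= theta_max_deg < 90.0:
        raise ValueError("theta_max_deg must lie in [0, 90)")
    return math.sin(math.radians(theta_max_deg))


# An aperture that admits rays up to 30 degrees from the axis gives NA = 0.5.
print(round(numerical_aperture(30.0), 3))  # prints 0.5
```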

In a lithographic projection apparatus, projection optics direct and shape the illumination from a source via a patterning device and onto a substrate. The term “projection optics” is broadly defined here to include any optical component that may alter the wavefront of the radiation beam. For example, projection optics may include at least some of the components 14A, 16Aa, 16Ab and 16Ac. An aerial image (AI) is the radiation intensity distribution at substrate level. A resist layer on the substrate is exposed and the aerial image is transferred to the resist layer as a latent “resist image” (RI) therein. The resist image (RI) can be defined as a spatial distribution of solubility of the resist in the resist layer. A resist model can be used to calculate the resist image from the aerial image, an example of which can be found in U.S. Patent Application Publication No. US 2009-0157630, the disclosure of which is hereby incorporated by reference in its entirety. The resist model is related only to properties of the resist layer (e.g., effects of chemical processes which occur during exposure, post-exposure bake (PEB) and development). Optical properties of the lithographic projection apparatus (e.g., properties of the source, the patterning device and the projection optics) dictate the aerial image and can be defined in an optical model. Since the patterning device used in the lithographic projection apparatus can be changed, it is desirable to separate the optical properties of the patterning device from the optical properties of the rest of the lithographic projection apparatus, including at least the source and the projection optics.

As shown in FIG. 13, the lithographic apparatus LA may form part of a lithographic cell LC, also sometimes referred to as a lithocell or lithocluster, which also includes apparatus to perform one or more pre- and post-exposure processes on a substrate. Conventionally these include one or more spin coaters SC to deposit a resist layer, one or more developers DE to develop exposed resist, one or more chill plates CH and one or more bake plates BK. A substrate handler, or robot, RO picks up a substrate from input/output ports I/O1, I/O2, moves it between the different process devices and delivers it to the loading bay LB of the lithographic apparatus. These devices, which are often collectively referred to as the track, are under the control of a track control unit TCU, which is itself controlled by the supervisory control system SCS, which also controls the lithographic apparatus via the lithographic control unit LACU. Thus, the different apparatus may be operated to maximize throughput and processing efficiency. The lithographic cell LC may further comprise one or more etchers to etch the substrate and one or more measuring devices configured to measure a parameter of the substrate. The measuring device may comprise an optical measurement device configured to measure a physical parameter of the substrate, such as a scatterometer, a scanning electron microscope, etc. The measuring device may be incorporated in the lithographic apparatus LA. An embodiment of the invention may be implemented in or with the supervisory control system SCS and/or the lithographic control unit LACU. For example, data from the supervisory control system SCS and/or the lithographic control unit LACU may be used by an embodiment of the invention, and one or more signals from an embodiment of the invention may be provided to the supervisory control system SCS and/or the lithographic control unit LACU.

FIG. 2 schematically depicts a method of predicting defects in a device manufacturing process. A defect can be a systematic defect such as necking, line pull back, line thinning, CD, overlapping and bridging. A defect can also be a random defect, such as one caused by deposition of a particle such as a dust particle. A systematic defect can be predicted and controlled. A defect can be in a resist image, an optical image or an etch image (i.e., a pattern transferred to a layer of the substrate by etching using the resist thereon as a mask). A computational or an empirical model 213 can be used to predict (e.g., predict the existence, locations, types, shapes, etc. of) defects 214. The model 213 can take into account parameters 211 (also referred to as process parameters) of the device manufacturing process and/or layout parameters 212. The process parameters 211 are parameters associated with the device manufacturing process but not with the layout. For example, the process parameters 211 may include a characteristic of the source (e.g., intensity, pupil profile, etc.), a characteristic of the projection optics, dose, focus, a characteristic of the resist, a characteristic of development of the resist, a characteristic of post-exposure baking of the resist, and/or a characteristic of etching. The layout parameters 212 may include shapes, sizes, relative locations, and absolute locations of various features on a layout, and also overlapping of features on different layouts. The model 213 may be a fixed model, i.e., the model itself does not change with its input such as the process parameters 211 and the layout parameters 212. Namely, an outcome of a fixed model is always the same under the same input. In an empirical model, the image (e.g., resist image, optical image, etch image) is not simulated; instead, the empirical model predicts defects based on correlations between the input and the defects.

In a computational model, a portion or a characteristic of the image is calculated, and defects are identified based on the portion or the characteristic. For example, a line pull back defect may be identified by finding a line end too far away from its desired location; a bridging defect may be identified by finding a location where two lines undesirably join.
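The two identification rules just described can be sketched in Python for a one-dimensional cut through a simulated image. This is a minimal illustration under stated assumptions:

- the printed pattern is a list of booleans sampled along the cut, with True where material prints;
- line-end positions are given in the same length unit as the tolerance;
- the function names are hypothetical and not part of the described method.

```python
def line_pull_back(printed_end: float, desired_end: float, tolerance: float) -> bool:
    """Flag line pull back when the printed line end falls short of its
    desired location by more than `tolerance` (e.g. in nm)."""
    return (desired_end - printed_end) > tolerance


def bridging(profile: list[bool], gap_start: int, gap_end: int) -> bool:
    """Flag bridging when every sample in the intended space
    profile[gap_start:gap_end] between two lines has printed, i.e. the
    two lines join along this cut."""
    if gap_end <= gap_start:
        raise ValueError("the intended space must contain at least one sample")
    return all(profile[gap_start:gap_end])


# Two 4-sample lines separated by an intended 3-sample space.
ok_profile = [True] * 4 + [False, True, False] + [True] * 4
bridged_profile = [True] * 11
print(bridging(ok_profile, 4, 7), bridging(bridged_profile, 4, 7))  # False True
print(line_pull_back(printed_end=95.0, desired_end=100.0, tolerance=3.0))  # True
```

A partially printed sliver inside the space (as in `ok_profile`) is not treated as a bridge. Only a fully closed space is.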

FIG. 3 illustrates an exemplary computational model. A source model 31 represents optical characteristics (including radiation intensity distribution and/or phase distribution) of the source. A projection optics model 32 represents optical characteristics (including changes to the radiation intensity distribution and/or the phase distribution caused by the projection optics) of the projection optics. A design layout model 35 represents optical characteristics (including changes to the radiation intensity distribution and/or the phase distribution caused by a given design layout) of a design layout, which is the representation of an arrangement of features on or formed by a patterning device. An aerial image 36 can be simulated from the source model 31, the projection optics model 32 and the design layout model 35. A resist and/or etch image 38 can be simulated from the aerial image 36 using a resist and/or etch model 37. Simulation of lithography can, for example, predict contours and/or CDs in an image.
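The chain of FIG. 3 (design layout, then aerial image, then resist image) can be illustrated with a deliberately simplified one-dimensional Python sketch. The assumptions are strong:

- the source and projection optics models are lumped into a single normalized Gaussian blur kernel (an incoherent-imaging approximation);
- the design layout model is a 0/1 transmission profile;
- the resist model is a constant intensity threshold.

The printed width then plays the role of a CD. None of these function names comes from the source.

```python
import math


def gaussian_kernel(sigma: float, half_width: int) -> list[float]:
    """Normalized Gaussian blur standing in for the source + projection optics models."""
    weights = [math.exp(-0.5 * (i / sigma) ** 2) for i in range(-half_width, half_width + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def aerial_image(mask: list[float], kernel: list[float]) -> list[float]:
    """Convolve the mask transmission with the (symmetric) kernel, zero outside the mask."""
    h = len(kernel) // 2
    n = len(mask)
    image = []
    for x in range(n):
        acc = 0.0
        for j, k in enumerate(kernel):
            idx = x + j - h
            if 0 <= idx < n:
                acc += k * mask[idx]
        image.append(acc)
    return image


def resist_image(intensity: list[float], threshold: float) -> list[bool]:
    """Constant-threshold resist model: True where the intensity reaches the threshold."""
    return [i >= threshold for i in intensity]


def printed_cd(resist: list[bool], pixel_nm: float) -> float:
    """Width of the exposed region, counted in pixels and scaled to nm."""
    return sum(resist) * pixel_nm


mask = [0.0] * 20 + [1.0] * 10 + [0.0] * 20  # a 10-pixel clear feature
ai = aerial_image(mask, gaussian_kernel(sigma=2.0, half_width=6))
print(printed_cd(resist_image(ai, threshold=0.5), pixel_nm=4.0))  # 40.0
```

With a symmetric blur, a 0.5 threshold reproduces the drawn width. Raising the threshold (a stand-in for a dose or resist change) narrows the printed CD, which is the kind of variation the defect models above look for.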

More specifically, it is noted that the source model 31 can represent the optical characteristics of the source, which include, but are not limited to, sigma (σ) settings as well as any particular illumination source shape (e.g. off-axis radiation sources such as annular, quadrupole, and dipole, etc.). The projection optics model 32 can represent the optical characteristics of the projection optics, which include aberration, distortion, refractive indexes, physical sizes, physical dimensions, etc. The design layout model 35 can represent physical properties of a physical patterning device, as described, for example, in U.S. Pat. No. 7,587,704, which is incorporated by reference in its entirety. The objective of the simulation is to accurately predict, for example, edge placements, aerial image intensity slopes and CDs, which can then be compared against an intended design. The intended design is generally defined as a pre-OPC design layout, which can be provided in a standardized digital file format such as GDSII or OASIS or another file format.

FIG. 4 schematically shows prediction of a process window of a layout (i.e., a space of process parameters under which the layout is substantially free of systematic defects). Sub-process windows 421-423 (depicted as the unhatched areas) may be predicted using a model (empirical or computational) for features in a layout with respect to different types of defects of the layout 411-413 (e.g., line pull back, CD, necking, etc.). For example, a process in the sub-process window 421 does not produce line pull back defects among these features. The sub-process windows of all features, and for all types of defects, may be merged to form the process window 430 of the layout.
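Merging the sub-process windows amounts to keeping only the process conditions that lie inside every sub-window. The Python sketch below evaluates that intersection on a sampled dose/focus grid. The three sub-window predicates and their limits are invented for illustration only; in practice each would come from the empirical or computational model for one defect type.

```python
from typing import Callable, Iterable

# A sub-process window: True when (dose, focus) is free of one defect type.
SubWindow = Callable[[float, float], bool]


def merge_windows(
    windows: list[SubWindow], doses: Iterable[float], focuses: Iterable[float]
) -> list[tuple[float, float]]:
    """Return the grid points that lie inside every sub-process window."""
    focus_list = list(focuses)  # iterated once per dose value
    return [
        (d, f)
        for d in doses
        for f in focus_list
        if all(window(d, f) for window in windows)
    ]


# Hypothetical sub-windows (dose in mJ/cm^2, focus in um).
def no_pull_back(dose: float, focus: float) -> bool:
    return dose >= 19.0


def no_necking(dose: float, focus: float) -> bool:
    return dose <= 21.0


def cd_in_spec(dose: float, focus: float) -> bool:
    return abs(focus) <= 0.05


window = merge_windows(
    [no_pull_back, no_necking, cd_in_spec],
    doses=[18.0, 19.0, 20.0, 21.0, 22.0],
    focuses=[-0.10, -0.05, 0.0, 0.05, 0.10],
)
print(len(window))  # 9
```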

FIG. 5 schematically depicts a method of predicting defects in a device manufacturing process, according to an embodiment. A classification model (also known as a classifier) 513 can be used to predict (e.g., predict the existence, locations, types, shapes, etc. of) defects 514. The model 513 can take account of process parameters 511 and/or layout parameters 512. The process parameters 511 are parameters associated with the device manufacturing process but not with the layout. For example, the process parameters 511 may include a characteristic of the source (e.g., intensity, pupil profile, etc.), a characteristic of the projection optics, dose, focus, a characteristic of the resist, a characteristic of development of the resist, a characteristic of post-exposure baking of the resist and/or a characteristic of etching. The layout parameters 512 may include shapes, sizes, relative locations, and absolute locations of various features on a layout, and also overlapping of features on different layouts.

The term “classifier” or “classification model” sometimes refers to a mathematical function, implemented by a classification algorithm, that maps input data to a category. In machine learning and statistics, classification is the problem of identifying to which of a set of categories (sub-populations) a new observation belongs, on the basis of a training set of data containing observations (or instances) whose category membership is known. The individual observations are analyzed into a set of quantifiable properties, known variously as explanatory variables, features, etc. These properties may be categorical (e.g. “good”, a process that does not produce defects, or “bad”, a process that produces defects). Classification is considered an instance of supervised learning, i.e. learning where a training set of correctly identified observations is available.

Terminology across fields is quite varied. In statistics, where classification may be done with logistic regression or a similar procedure, the properties of observations are termed explanatory variables (or independent variables, regressors, etc.), and the categories to be predicted are known as outcomes, which are considered to be possible values of the dependent variable. In machine learning, the observations are often known as instances, the explanatory variables are termed features (grouped into a feature vector), and the possible categories to be predicted are classes.

A classification model may be phrased in terms of a linear function that assigns a score to each possible category k by combining the feature vector of an instance with a vector of weights, using a dot product. The predicted category is the one with the highest score. This type of score function is known as a linear predictor function and has the following general form: $\mathrm{score}(X_i, k) = \beta_k \cdot X_i$, where $X_i$ is the feature vector for instance i, $\beta_k$ is the vector of weights corresponding to category k, and $\mathrm{score}(X_i, k)$ is the score associated with assigning instance i to category k. Models with this basic setup are known as linear classifiers. Examples of such algorithms are logistic regression, multinomial logit, probit regression, the perceptron algorithm, support vector machines, import vector machines and/or linear discriminant analysis.
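The linear predictor function and the highest-score rule can be written in a few lines of Python. This sketch assumes a feature vector whose first entry is a constant 1, so that the first weight acts as an offset. The remaining entries are, for example, a dose offset and a focus value. The weight vectors are arbitrary illustrative numbers, not fitted values.

```python
def score(x: list[float], beta_k: list[float]) -> float:
    """Linear predictor score(X_i, k) = beta_k . X_i."""
    if len(x) != len(beta_k):
        raise ValueError("feature and weight vectors must have the same length")
    return sum(b * xi for b, xi in zip(beta_k, x))


def predict_category(x: list[float], betas: dict[str, list[float]]) -> str:
    """Assign the instance to the category with the highest score."""
    return max(betas, key=lambda k: score(x, betas[k]))


# Illustrative weights over features [1, dose offset, focus].
betas = {"good": [1.0, 0.0, 0.0], "bad": [0.0, 1.0, 1.0]}
print(predict_category([1.0, 0.5, -0.2], betas))  # good (scores 1.0 vs 0.3)
print(predict_category([1.0, 2.0, 0.5], betas))   # bad (scores 1.0 vs 2.5)
```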

In an embodiment, the classification model 513 involves logistic regression. In the context of this embodiment, the dependent variable is binary, that is, the number of available categories is two, e.g., “good” or “bad.” However, the number of available categories is certainly not limited to two.

Logistic regression measures the relationship between a categorical dependent variable and one or more independent variables, which are usually (but not necessarily) continuous, by using probability as the predicted values of the dependent variable. The classification model 513 may be trained using a training set of data containing one or more process and/or layout parameters and whether the process and/or layout parameters produce defects (i.e., “bad”) or not (i.e., “good”). An initial training set may be obtained from one or more test runs of a layout under a range of values of the parameters.
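For the two-category case, logistic regression models the probability of the “bad” category as the logistic function of the linear score: P(bad | x) = 1 / (1 + exp(-(b + w·x))). The Python sketch below fits w and b by batch gradient descent on the log-loss with a small L2 penalty, using only the standard library. The training data and the hyperparameters are made up for illustration and do not come from the source. Each observation is a dose offset from a nominal value plus a focus value, labelled 1 for “bad” and 0 for “good”.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Probability that x falls in the "bad" (defect) category."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def train_logistic(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lr: float = 0.5,
    epochs: int = 2000,
    l2: float = 1e-3,
) -> tuple[list[float], float]:
    """Batch gradient descent on the mean log-loss plus an L2 penalty on w."""
    n_features = len(X[0])
    m = len(X)
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # d(log-loss)/d(score)
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * (gj / m + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


# Hypothetical observations: (dose offset, focus); 1 = "bad", 0 = "good".
X = [(-2.0, 0.0), (-1.5, 0.05), (-1.0, -0.05), (-0.5, 0.0),
     (0.5, 0.0), (1.0, 0.05), (1.5, -0.05), (2.0, 0.0)]
y = [1, 1, 1, 1, 0, 0, 0, 0]
w, b = train_logistic(X, y)
print(predict_proba(w, b, (-1.8, 0.0)) > 0.5, predict_proba(w, b, (1.8, 0.0)) > 0.5)  # True False
```

A library implementation (e.g. scikit-learn's `LogisticRegression`) would normally replace the hand-written loop. The explicit version is shown only to make the fitted quantity visible.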

In an embodiment, the classification model 513 involves kernel logistic regression, especially when the score function cannot be expressed in the linear form $\mathrm{score}(X_i, k) = \beta_k \cdot X_i$, where $X_i$ is the feature vector for instance i, $\beta_k$ is the vector of weights corresponding to category k, and $\mathrm{score}(X_i, k)$ is the score associated with assigning instance i to category k. A kernel may first be used to project the independent variables (e.g., process parameters) into another parameter space, $\Phi: X \to Y$, so that $\mathrm{score}(X_i, k) = \beta_k \cdot Y_i$, where $Y_i = \Phi(X_i)$.
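The explicit projection Φ can be sketched in Python with a quadratic feature map. This is an assumption; the source does not specify the kernel. With Y = Φ(d, f) = (1, d, f, d², f², d·f), where d is the dose offset and f the focus, a score that is linear in Y can bound a closed, elliptical “good” region in the original (d, f) space. Conditions on both sides of nominal can then be “bad”, which no score linear in X alone can express. The weight vector is hand-picked for illustration rather than trained.

```python
import math


def phi(d: float, f: float) -> list[float]:
    """Quadratic feature map Phi: X -> Y (an illustrative choice of kernel)."""
    return [1.0, d, f, d * d, f * f, d * f]


def p_bad(beta: list[float], d: float, f: float) -> float:
    """Logistic probability of "bad" using a score linear in Y = Phi(X)."""
    z = sum(b * y for b, y in zip(beta, phi(d, f)))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical weights: score = d^2 + 100 f^2 - 1, i.e. "bad" outside an
# ellipse with half-axes 1 (dose offset) and 0.1 (focus).
beta_bad = [-1.0, 0.0, 0.0, 1.0, 100.0, 0.0]
for d, f in [(0.0, 0.0), (2.0, 0.0), (-2.0, 0.0), (0.0, 0.2)]:
    print(d, f, round(p_bad(beta_bad, d, f), 3))
```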

The method illustrated in FIG. 5 may further include a correction step 515, in which one or more process parameters 511, one or more layout parameters 512, or both, may be adjusted to reduce or eliminate the defects.

In an embodiment, the model 513 is not a fixed model. Instead, the model 513 may be refined with data 516 from metrology, yield data (e.g., identification of defects by a measurement tool such as an electron microscope, by electrical testing, etc.) or other data, from a user of the lithographic apparatus, or from another model (e.g., another empirical model or computational model). The model 513 may be refined using further data after exposure of one or more dies and/or one or more substrates.

The data 516 may include measured or determined values of a process parameter associated with a plurality of production substrates processed by the device manufacturing process. Production substrates are substrates having one or more devices in one or more stages of production.

For example, the production substrates may be substrates having a resist image for one or more devices. Values of a process parameter for such substrates may include data from the lithographic apparatus (e.g., apparatus settings and/or lithographic apparatus sensor data) and/or metrology data (e.g., provided by a dedicated optical measuring device to measure physical parameters of the resist image). As another example, the production substrates may be substrates having etched features and/or features with functioning devices. Values of a process parameter for such substrates may include data from an etch tool (e.g., etch tool settings and/or etch tool sensor data), metrology data (e.g., provided by a scanning electron microscope) and/or yield data (e.g., defect analysis from a measuring tool comparing a produced device against the expected device, electrical testing of devices, etc.).

Further, the device manufacturing process may involve the entire process from substrate to device or a portion thereof. For example, the device manufacturing process may be the lithographic patterning process only or in combination with another device manufacturing procedure. In an embodiment, the device manufacturing process may be the etch process only or in combination with another device manufacturing procedure. In the etch circumstance, the device manufacturing process involves a lithographic apparatus because the substrates processed by the etch apparatus were patterned by a lithographic patterning procedure involving a lithographic apparatus.

In an embodiment, there is provided an indication regarding existence of defects associated with the production substrates processed in the device manufacturing process under the values of the process parameter. Thus, in an embodiment, each of the measured or determined values of a process parameter is associated with an indication regarding existence of a defect. For example, the indication regarding existence of defects may be any label to signify the existence or absence of a defect, such as “good” and/or “bad”. The label may be applied by a user or determined automatically using an applicable tool. For example, electrical testing of a substrate may identify a defect in a device and label the device “good” or “bad”. That tested substrate is associated with a value of a process parameter. In an example, if the yield is below a certain threshold, the associated values of the process parameter 511 and/or layout parameter 512 may be labeled/categorized as “bad.” The combination of the value of the process parameter (e.g., dose and focus) and the label is used to train the model 513.
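The yield-threshold labelling step can be sketched as follows. Each record is assumed to hold a (dose, focus) pair and the yield fraction measured for substrates processed under it. The 0.9 threshold, the record layout and the function name are hypothetical choices, not values from the source.

```python
from typing import Iterable


def label_by_yield(
    records: Iterable[tuple[float, float, float]], yield_threshold: float = 0.9
) -> list[tuple[tuple[float, float], str]]:
    """Turn (dose, focus, yield) records into ((dose, focus), label) training pairs.

    A process condition is labelled "bad" when its yield is below the threshold
    and "good" otherwise.
    """
    return [
        ((dose, focus), "bad" if yld < yield_threshold else "good")
        for dose, focus, yld in records
    ]


training_pairs = label_by_yield([(20.0, 0.00, 0.97), (18.5, 0.08, 0.62)])
print(training_pairs)  # [((20.0, 0.0), 'good'), ((18.5, 0.08), 'bad')]
```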

Data from metrology may be obtained from an optical measuring tool (e.g., a tool to measure diffracted radiation from a metrology target and/or from the exposed area), an electron microscope, or another suitable inspection tool. It may also be data measured by a sensor in a lithographic apparatus, such as level sensor data or alignment data.

In an embodiment, refinement of the model 513 may include training with a training set that includes a new observation. The new observation combines (a) one or more process parameters, or both process and layout parameters, used in the exposure of the one or more dies and/or one or more substrates with (b) the data from metrology, yield data or other data, from a user of the lithographic apparatus, or from another model. The training set used for refining the model 513 need not include all the data previously used to train the model 513. For example, if the model 513 is initially trained with a data set including 100 observations, this training set may include 99 of the 100 observations and the new observation. This approach may limit the size of the training set so as to limit the computational cost of the training. One or more algorithms may be used to manage, or continuously manage, the size of the data set. For example, an import vector machine or a support vector machine may be used to manage the size of the data set.
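A simple way to cap the training-set size, in the spirit of keeping 99 of 100 old observations plus the new one, is a first-in, first-out buffer. The Python sketch below uses `collections.deque` with `maxlen`. Note that this is a plainer policy swapped in for illustration. The text instead suggests import vector machine or support vector machine techniques, which keep the observations most informative for the decision boundary rather than the most recent ones. The class name is hypothetical.

```python
from collections import deque


class BoundedTrainingSet:
    """Keeps at most `max_size` observations, discarding the oldest first."""

    def __init__(self, max_size: int) -> None:
        self._data: deque = deque(maxlen=max_size)

    def add(self, features: tuple[float, ...], label: str) -> None:
        """Append an observation; the oldest one is dropped when full."""
        self._data.append((features, label))

    def observations(self) -> list:
        """Current training set, oldest observation first."""
        return list(self._data)

    def __len__(self) -> int:
        return len(self._data)


ts = BoundedTrainingSet(max_size=100)
for i in range(101):  # 100 initial observations plus one new one
    ts.add((float(i), 0.0), "good")
print(len(ts), ts.observations()[0][0])  # 100 (1.0, 0.0)
```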

FIG. 6 schematically depicts a method of training a classification model such as the model 513:

- In step 611, a defect of a device, e.g., in a resist or optical image of the device, is predicted using the classification model. The independent variables of the classification model are one or more process and/or layout parameters used to form the device.
- In step 612, the device (e.g., the resist or optical image) is simulated using another model (e.g., empirical or computational), or the device (e.g., the resist image or the etched pattern) is measured using a suitable inspection tool. The existence, shapes, types, and/or locations of defects are then determined.
- In step 613, the model is trained based on the prediction and the existence of defects as determined from the simulation or measurement.
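Steps 611 to 613 form a predict, inspect, update loop. The Python sketch below assumes a logistic classifier updated by one stochastic-gradient step per new observation. The `inspect` function is a hypothetical stand-in for the simulation or inspection of step 612, and the learning rate is an illustrative value.

```python
import math
from typing import Callable, Sequence


class OnlineDefectClassifier:
    """Logistic classifier refined one observation at a time."""

    def __init__(self, n_features: int, lr: float = 0.1) -> None:
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict_proba(self, x: Sequence[float]) -> float:
        """Step 611: predicted probability of a defect for parameters x."""
        z = self.b + sum(wj * xj for wj, xj in zip(self.w, x))
        if z >= 0:
            return 1.0 / (1.0 + math.exp(-z))
        e = math.exp(z)
        return e / (1.0 + e)

    def update(self, x: Sequence[float], defect_observed: bool) -> None:
        """Step 613: one gradient step on the log-loss for the new observation."""
        err = self.predict_proba(x) - (1.0 if defect_observed else 0.0)
        self.w = [wj - self.lr * err * xj for wj, xj in zip(self.w, x)]
        self.b -= self.lr * err


def run_loop(
    clf: OnlineDefectClassifier,
    observations: Sequence[Sequence[float]],
    inspect: Callable[[Sequence[float]], bool],
) -> list[float]:
    """Predict, inspect (step 612), then train; returns the predictions made."""
    predictions = []
    for x in observations:
        predictions.append(clf.predict_proba(x))
        clf.update(x, inspect(x))
    return predictions


def inspect(x: Sequence[float]) -> bool:
    """Hypothetical inspection result: a defect whenever the dose offset is negative."""
    return x[0] < 0.0


clf = OnlineDefectClassifier(n_features=2)
run_loop(clf, [(-1.0, 0.0), (1.0, 0.0)] * 50, inspect)
print(clf.predict_proba((-1.0, 0.0)) > 0.5, clf.predict_proba((1.0, 0.0)) > 0.5)  # True False
```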

FIG. 7 shows an exemplary output of a classification model as trained by a training set including 187 observations. Each observation contains a pair of focus and dose as the process parameters, together with whether that pair produces defects (“◯” means no defect, e.g., “good”; “X” means defect, e.g., “bad”). The probability of a defect is determined by the model after training and shown in the output of FIG. 7 as an isopleth map with probabilities associated with the contour lines. As will be appreciated, the output of the model can be in other forms such as a color or gray scale or a table of results. The output of the model matches reasonably well with the training set.

Thus, in an embodiment, there is provided in-line learning of a device manufacturing process involving a lithographic apparatus. That is, a classification model is produced that is continually or regularly trained with new measured or determined values of one or more process parameters (e.g., dose and/or focus) associated with production substrates and an indication of a defect associated with such one or more process parameters. Thus, in an embodiment, a model specific to a lithographic apparatus and/or a device manufacturing process is produced that evolves over time with the use of the lithographic apparatus and/or processing of substrates using the device manufacturing process. Thus, in an embodiment, empirical data is used to generate, and subsequently shape, a model that represents a device manufacturing process (e.g., a particular device manufacturing process using a particular patterning device layout).

Because the model is built from empirical data, little data interpretation may be required. For example, such a process may directly use a pupil intensity map of a metrology tool in training the classification model, rather than interpreting such an intensity map to extract structures and edge slopes and then using the data to improve a model. For example, learning techniques can correlate the possibility of an existence of a defect with the pupil intensity map based on the already learned model, and so the data from the pupil intensity map can help affirm the existence (or not) of a defect. Thus, in an embodiment, such data need not have an associated label per se but nevertheless may be used to increase the ability of the model to predict whether a defect will occur or not. For example, the pupil intensity map may indicate a feature having a parameter value out of line with adjacent values of the parameter. While not necessarily confirming a defect, such information may be used to train the model to help affirm or deny an existing correlation in the model regarding that feature and defect probability. In an embodiment, measurements are of device layouts using, e.g., a measurement pupil of an optical measuring tool. So, special structures (e.g., metrology targets) may not be necessary; any structure may do, such as the device layout structures (for example, SRAM cell-blocks in logic/MPU devices).
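One simple way to flag a value that is “out of line with adjacent values” in a measured map is to compare each value with the median of its neighbors. The Python sketch below does this for a one-dimensional sequence of values, for example a parameter extracted along one row of a measured map. The neighborhood radius, the threshold and the median rule are illustrative assumptions. The flagged indices would serve only as unlabelled evidence for the model, not as confirmed defects.

```python
from statistics import median


def out_of_line(values: list[float], threshold: float, radius: int = 2) -> list[int]:
    """Return indices whose value differs by more than `threshold` from the
    median of the up-to-2*radius neighbors around it (the value itself excluded)."""
    flagged = []
    for i, v in enumerate(values):
        lo, hi = max(0, i - radius), min(len(values), i + radius + 1)
        neighbors = values[lo:i] + values[i + 1:hi]
        if neighbors and abs(v - median(neighbors)) > threshold:
            flagged.append(i)
    return flagged


readings = [10.0, 10.2, 10.1, 13.5, 10.0, 9.9, 10.1, 10.0]  # hypothetical values
print(out_of_line(readings, threshold=1.0))  # [3]
```

The median is used instead of the mean so that a single outlier does not also cause its immediate neighbors to be flagged.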

In an embodiment, operator knowledge/experience may be used as an input to the formation of the classifier model. Operator feedback may steer the predictability of the model. For example, the user may add more predictive features into the classification algorithm.

In an embodiment, the classifier model can generate a probability of a defect given specific measured or determined features. Moreover, the accuracy of the prediction increases with time and with more measurements as the classifier model becomes more “experienced”. This in-line learning is distinguished from so-called data mining, which is typically used to review why things went wrong. In an embodiment, the in-line data is used to generate, and subsequently update, a model that can predict the probability of a defect occurring as the process is running. Thus, the output of the model may provide an indicator of what to check and which substrate dies to measure to see if all goes well, and thus may help improve the overall yield.
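Using the model output to decide which dies to measure can be as simple as ranking dies by predicted defect probability and measuring the riskiest ones within a metrology budget. In the Python sketch below, the die identifiers, probabilities, budget and minimum probability are hypothetical. The probabilities would come from the trained classifier.

```python
def dies_to_measure(
    die_probs: dict[str, float], budget: int, min_prob: float = 0.0
) -> list[str]:
    """Pick up to `budget` dies with the highest predicted defect probability,
    ignoring dies whose probability is below `min_prob`."""
    ranked = sorted(die_probs.items(), key=lambda item: item[1], reverse=True)
    return [die for die, p in ranked if p >= min_prob][:budget]


predicted = {"die_03": 0.42, "die_07": 0.05, "die_11": 0.81, "die_12": 0.17}
print(dies_to_measure(predicted, budget=2))  # ['die_11', 'die_03']
```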

In an embodiment, the device manufacturing process may be controlled using the classifier model. The in-line learning allows for tracking of the process (e.g., drift) and allows for tuning (controlling) of the process. For example, one or more parameters of the lithographic apparatus may be controlled based on the output of the classifier model, whether automatically or after user evaluation. For example, the focus and/or dose of a lithographic projection apparatus may be controlled based on the output.
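A simple way to turn a defect-probability output into a focus/dose correction is to evaluate a small set of candidate settings around the current one and keep the candidate with the lowest predicted probability. The sketch below assumes that approach; `example_predictor`, its best-focus and best-dose values and the step sizes are hypothetical stand-ins for a trained classifier and real tool limits.

```python
import itertools
import math

def choose_setpoint(predict_defect, current, focus_steps, dose_steps):
    """Return the (focus, dose) candidate, reachable from `current` by one of the
    allowed step combinations, with the lowest predicted defect probability."""
    candidates = [(current[0] + df, current[1] + dd)
                  for df, dd in itertools.product(focus_steps, dose_steps)]
    return min(candidates, key=lambda fd: predict_defect(*fd))

def example_predictor(focus, dose):
    # Hypothetical stand-in for a trained classifier: defect probability rises
    # away from a best focus of 0.02 um and a best dose of 30.0 mJ/cm2.
    logit = 40.0 * (focus - 0.02) ** 2 + 0.5 * (dose - 30.0) ** 2 - 3.0
    return 1.0 / (1.0 + math.exp(-logit))

best = choose_setpoint(example_predictor, current=(0.0, 29.0),
                       focus_steps=(-0.02, 0.0, 0.02),
                       dose_steps=(-0.5, 0.0, 0.5, 1.0))
print(best)  # → (0.02, 30.0)
```

Restricting the search to bounded steps keeps each correction small, which suits either automatic application or review by a user before the setting is applied.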

In an embodiment, the classifier model incorporates measurements across a substrate (e.g., not merely intrafield data). Thus, the measurements may include data through focus, as the local focus differences across a substrate may be relatively large. Thus, by relying on measurements from a plurality of different substrates, it is possible to measure at a specific dose, completely through focus, without the need for generating separate exposures.
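The pooling idea can be sketched as follows: measurements from several substrates exposed at the same nominal dose are grouped by the local focus actually seen at each site, giving a through-focus defect rate without a dedicated focus-exposure matrix. The record layout, the `local_focus` definition and the bin width are assumptions for illustration.

```python
from collections import defaultdict

def through_focus_defect_rate(records, dose, focus_bin, dose_tol=1e-6):
    """Pool measurements from many substrates exposed at one nominal dose.

    records: (substrate_id, dose, local_focus, defect) tuples, where local_focus
    is assumed to be the focus at the measured site (nominal focus plus local
    substrate deviation). Returns {bin centre: defect fraction}.
    """
    counts = defaultdict(lambda: [0, 0])  # bin index -> [defects, total]
    for _substrate_id, rec_dose, focus, defect in records:
        if abs(rec_dose - dose) > dose_tol:
            continue  # keep only the requested dose
        idx = round(focus / focus_bin)
        counts[idx][0] += int(defect)
        counts[idx][1] += 1
    return {idx * focus_bin: d / n for idx, (d, n) in sorted(counts.items())}
```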

In an embodiment, while the discussion has focused on lithographic parameters such as focus and dose, the learning paradigm may be readily extended to other processes, such as etch. For instance, the relationship between etch features and yield may be learned, which may also be observed with in-line metrology data.

Thus, in an embodiment, there is provided a learning classifier model that can predict a defect and estimate its probability. Moreover, in an embodiment, the learning classifier is not static and is continuously updated and improved by measured or determined data during the device manufacturing process. Further, the classifier model can be extended in its coverage by feeding it data which is not related to lithography, such as layer-thickness variations, post-etch data, operator defect decisions, etc., which further enhance the “experience” of the model.

In an embodiment, the classifier model enables improvement of the estimation of a probability of a defect for non-measured data points. For example, one may be interested in a prediction of a defect at sites A and B on a substrate or a pattern layout. The classifier model enables prediction of the occurrence of a defect at sites A and B. Then, a measurement that adds information (e.g., metrology data (without or with label data), yield data, etc.) associated with site A may be used to further train the classifier model. Now, an updated estimate of the probability of a defect at both sites A and B can be determined, without having to measure at site B.
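To illustrate how a measurement at site A can change the estimate at an unmeasured site B, the sketch below swaps in a Gaussian-kernel smoother in place of the classifier: each labelled observation pulls the estimate toward its label with a weight that decays with distance in feature space. The kernel, the prior of 0.5 and the site feature vectors are hypothetical choices, not the patented model.

```python
import math

def kernel_defect_probability(query, observations, length_scale=1.0,
                              prior=0.5, prior_weight=1.0):
    """Gaussian-kernel-weighted defect probability at `query`.

    observations: (feature_vector, defect) pairs. Each observation pulls the
    estimate toward its label with a weight that decays with distance, so a
    measurement at site A also moves the estimate at a nearby site B.
    """
    num = prior * prior_weight
    den = prior_weight
    for x, defect in observations:
        d2 = sum((q - xi) ** 2 for q, xi in zip(query, x))
        w = math.exp(-d2 / (2.0 * length_scale ** 2))
        num += w * (1.0 if defect else 0.0)
        den += w
    return num / den

site_a, site_b = (0.0, 0.0), (0.5, 0.0)   # hypothetical feature vectors
before_b = kernel_defect_probability(site_b, [])
observations = [(site_a, True)]             # new labelled measurement at A only
after_a = kernel_defect_probability(site_a, observations)
after_b = kernel_defect_probability(site_b, observations)
```

The estimate at B rises after the defect observed at A, but by less than at A itself, because B is only partly similar to A.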

In an embodiment, the classifier model vectors may include various types of information. So, for example, the classifier model may include data regarding the probability of a defect for a particular dose and focus combination but also data regarding which one or more apparatuses are associated with that data point, which pattern layout is associated with that data point, which etch type was used, etc. So, in an embodiment, a classifier model may be trained on limited and specific data (e.g., merely dose and focus information and an associated label), on comprehensive data, or on some variant in between. From a more comprehensive model, a “submodel” may be defined that is focused on a particular apparatus (e.g., lithographic apparatus), on a particular layout, etc. Thus, a user may employ a particular model or “submodel”, whether, e.g., for analysis or for process control, that focuses on the user's needs or desires.
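A submodel can be pictured as the same training data filtered on its context fields. The sketch below assumes a flat record type and uses the empirical defect rate as the simplest possible model of a selection; the field names, tool names and layout labels are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainingRecord:
    # Hypothetical record layout: process values, context metadata and a label.
    focus: float
    dose: float
    apparatus: str
    layout: str
    etch_type: str
    defect: bool

def submodel_records(records, **context):
    """Select the records matching every given context field, e.g.
    apparatus="scanner-A". With no criteria, all records are returned."""
    return [r for r in records
            if all(getattr(r, field) == value for field, value in context.items())]

def defect_rate(records):
    # Simplest possible 'model' of a selection: its empirical defect rate.
    return sum(r.defect for r in records) / len(records) if records else None
```

In practice a classifier would be trained on each selection instead of a single rate, but the selection step is the same.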

In an embodiment, the training of the classifier model decides if new training data (e.g., measured data points) add enough information to be included in the model. Adding this new information is balanced against increasing the size of the model, so that the model does not grow without bound.
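One possible admission rule, sketched here as an assumption rather than the described training procedure, is to add a new point only if it is far from every stored point or if the current model mispredicts it, and to cap the stored set by evicting the oldest point. The nearest-neighbour predictor, the distance threshold and the first-in-first-out eviction are all hypothetical choices.

```python
import math

class BoundedSupportModel:
    """Hypothetical nearest-neighbour model with a novelty-gated, size-capped
    set of stored (features, defect) points."""

    def __init__(self, min_distance, max_size):
        self.min_distance = min_distance
        self.max_size = max_size
        self.support = []

    def _nearest(self, x):
        return min(self.support, key=lambda s: math.dist(s[0], x), default=None)

    def predict(self, x):
        nearest = self._nearest(x)
        return None if nearest is None else nearest[1]

    def offer(self, x, defect):
        """Store the point only if it adds information; return True if stored."""
        nearest = self._nearest(x)
        if nearest is not None:
            novel = math.dist(nearest[0], x) > self.min_distance
            wrong = nearest[1] != defect
            if not (novel or wrong):
                return False  # redundant: close to a point with the same label
        self.support.append((tuple(x), defect))
        if len(self.support) > self.max_size:
            self.support.pop(0)  # evict the oldest point to bound model size
        return True
```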

In an embodiment, the classifier model can provide an on-product printability prediction. For example, the classifier model can quantify the probability of a defect. The classifier model may provide full-substrate predictions. The classifier model may predict a number of defects. The classifier model may predict the yield of good dies. The classifier model may provide a location of the defect.
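The outputs listed above can be derived from per-site defect probabilities. The sketch assumes that sites fail independently, which is a simplification: the expected number of defects is the sum of the probabilities, and the probability that a die is good is the product of (1 − p) over its sites. The die identifiers and input layout are hypothetical.

```python
import math

def summarize_predictions(site_probs):
    """Aggregate per-site defect probabilities into on-product outputs.

    site_probs maps a die identifier to the list of predicted defect
    probabilities for the sites in that die (must be non-empty). Sites are
    assumed to fail independently.
    """
    expected_defects = sum(sum(ps) for ps in site_probs.values())
    die_good = {die: math.prod(1.0 - p for p in ps) for die, ps in site_probs.items()}
    worst_die, worst_site, worst_p = max(
        ((die, i, p) for die, ps in site_probs.items() for i, p in enumerate(ps)),
        key=lambda t: t[2],
    )
    return {
        "expected_defects": expected_defects,
        "p_die_good": die_good,
        "expected_good_dies": sum(die_good.values()),
        "most_likely_defect": (worst_die, worst_site, worst_p),
    }
```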

In an embodiment, the training data for the classifier model may be sampled for each substrate of a lot produced by a device manufacturing process. The training data may be sampled for each device on a substrate of, e.g., a lot of substrates. The training data may be sampled for each layer on a substrate. In an embodiment, the training data comprises measurement data, and measurement locations may be in-die, outside of the die (e.g., scribe-line metrology targets) and/or across the substrate. In-die measurements may be sampled based on a simulation result of hotspots, on the location of particular device structures (e.g., SRAM and other locations), and/or be device (e.g., IC) dependent.
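A sampling plan along these lines might combine scribe-line targets with the most severe simulated hotspots for every substrate and layer. This is a minimal sketch; the severity ranking, the `max_in_die` cut-off and the tuple layout are hypothetical.

```python
def sampling_plan(substrates, layers, scribe_targets, hotspots, max_in_die):
    """Build a hypothetical per-substrate, per-layer list of measurement sites.

    scribe_targets: (x, y) positions of scribe-line metrology targets.
    hotspots: ((x, y), severity) pairs, e.g. from a layout simulation.
    The `max_in_die` most severe hotspots are measured in-die.
    """
    in_die = [loc for loc, _severity in
              sorted(hotspots, key=lambda h: h[1], reverse=True)[:max_in_die]]
    plan = []
    for substrate in substrates:
        for layer in layers:
            plan.extend((substrate, layer, "scribe", loc) for loc in scribe_targets)
            plan.extend((substrate, layer, "in-die", loc) for loc in in_die)
    return plan
```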

In an embodiment, new training data is continually supplied (e.g., during device manufacturing, during the processing of a plurality of lots of substrates, etc.) and thus prediction quality is continually updated.

In an embodiment, output of the classifier model may be provided to a yield management system of a fab to improve device yield.

So, in an embodiment, the values of one or more process parameters and/or layout parameters are statistically correlated by machine learning to full-substrate, on-product yield sensitivities. Thus, for example, defects may be predicted for particular parts of a die and/or particular parts of a substrate. Further, the defects may be predicted for actual production based on actual production data. Thus, the machine learning model may be specific to actual process conditions (including, for example, their drift) and enable enhanced prediction compared to prediction based merely on a theoretical model using pattern layout data.

Thus, in an embodiment, there is provided a comprehensive defect inspection and yield prediction system in which in-line and on-product parameter values from, e.g., a metrology tool measuring production substrates, are used to infer printability defects. The system utilizes artificial intelligence techniques to predict printability of systematic, layout-specific defects. It extends hotspot prediction from single-die to full-substrate, including the edge-of-substrate region. In an embodiment, the system may measure each substrate in a lot using a metrology tool. The system may augment or replace current defect-inspection methods (e.g., scanning electron microscopes). The output of the system may include a performance indicator within the fab's yield management system to predict and/or improve final device yield. The system may generate customized lithographic apparatus or other apparatus recipes and files to (automatically) improve yield in subsequent lots (or substrates) of, e.g., the same device and/or layer. The system may enable continual estimation and tracking of defects and continually improve model prediction accuracy. Occurrence of defects may be reduced or minimized via regulation (e.g., closed-loop control).

Advantages of the embodiments described herein may include faster yield ramping, more efficient SEM review, historical analysis and/or control.

FIG. 8 is a block diagram of a model predictive control system according to an embodiment. As shown, one or more inputs 800 are provided to a device manufacturing process 810 involving production substrates patterned using a lithographic apparatus. The inputs 800 may include one or more layout parameters and/or one or more process parameters as described above. The device manufacturing process 810 involves at least one device production step, such as lithographic patterning, development, etch, etc., or any combination selected therefrom.

Subsequent to or during the process 810, one or more outputs 820 may be produced. The outputs 820 may include values of one or more process and/or layout parameters for production substrates produced using the device manufacturing process. For example, the values may be data of the production substrates measured by a metrology tool, may be data from a lithographic apparatus or etch tool after processing of the production substrates, etc. In an embodiment, at least some of the data may be labeled as described herein. Such outputs 820 are provided to state estimator 830 to train a model as described herein. In an embodiment, the model is used to predict defects, although a model may be trained to predict other aspects. As shown, the state estimator 830 may also receive one or more inputs 800 to the process 810. For example, the one or more inputs 800 may be layout data or data produced from layout data. For example, data produced from the layout data may be a simulation of a pattern layout to identify hotspots (e.g., areas of a pattern layout prone to not pattern correctly). Such simulated data may be produced using simulation software in the art such as ASML's Tachyon LMC product.

The model of the state estimator 830 may then be used to provide an output to regulator 840. Regulator 840 may provide one or more inputs 800 to the process 810 and/or modify one or more inputs 800 to be supplied to process 810. For example, the regulator 840 may generate one or more settings for a lithographic apparatus, etch tool, etc. to help mitigate defects in the future production of substrates. In an embodiment, the regulator 840 may receive one or more targets 850 that identify to what, or by what standard, the regulator 840 should introduce or modify one or more inputs 800 to the process 810.
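The loop of FIG. 8 can be illustrated with a toy focus-drift example. This sketch is deliberately simpler than a full model predictive controller: the process model (linear drift of best focus), the exponentially smoothed estimator and the one-step regulator are hypothetical stand-ins for elements 810, 830 and 840.

```python
def run_control_loop(n_lots, drift_per_lot, alpha=0.5, target_offset=0.0):
    """Toy closed loop in the spirit of FIG. 8 (hypothetical models and numbers).

    process 810   : true best focus drifts linearly from lot to lot
    outputs 820   : metrology reports the focus error of the applied setting
    estimator 830 : exponentially smoothed estimate of the best focus
    regulator 840 : sets the next focus input to estimate + target 850
    Returns the measured focus error of every lot.
    """
    focus_input = 0.0            # input 800 (focus setting)
    best_focus_estimate = 0.0    # state held by the estimator
    errors = []
    for lot in range(n_lots):
        true_best_focus = drift_per_lot * lot
        measured_error = focus_input - true_best_focus
        errors.append(measured_error)
        observed_best = focus_input - measured_error   # exact here (noise-free)
        best_focus_estimate += alpha * (observed_best - best_focus_estimate)
        focus_input = best_focus_estimate + target_offset
    return errors

controlled = run_control_loop(20, 0.01)
uncontrolled = run_control_loop(20, 0.01, alpha=0.0)  # estimator never updates
```

With feedback the focus error settles at a small constant lag (about drift/alpha) instead of growing with every lot; a fuller model predictive controller would add a drift model to remove that residual lag.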

FIG. 9 is a block diagram that illustrates a computer system 100 which can assist in implementing optimization methods and flows disclosed herein. Computer system 100 includes a bus 102 or other communication mechanism to communicate information, and a processor 104 (or multiple processors 104 and 105) coupled with bus 102 to process information. Computer system 100 may also include a main memory 106, such as a random access memory (RAM) or other dynamic storage device, coupled to bus 102 to store and/or supply information and instructions to be executed by processor 104. Main memory 106 may be used to store and/or supply temporary variables or other intermediate information during execution of instructions to be executed by processor 104. Computer system 100 may further include a read only memory (ROM) 108 or other static storage device coupled to bus 102 to store and/or supply static information and instructions for processor 104. A storage device 110, such as a magnetic disk or optical disk, may be provided and coupled to bus 102 to store and/or supply information and instructions.

Computer system 100 may be coupled via bus 102 to a display 112, such as a cathode ray tube (CRT) or flat panel or touch panel display, to display information to a computer user. An input device 114, including alphanumeric and other keys, may be coupled to bus 102 to communicate information and command selections to processor 104. Another type of user input device may be cursor control 116, such as a mouse, a trackball, or cursor direction keys, to communicate direction information and command selections to processor 104 and to control cursor movement on display 112. This input device typically has two degrees of freedom in two axes, a first axis (e.g., x) and a second axis (e.g., y), that allows the device to specify positions in a plane. A touch panel (screen) display may also be used as an input device.

According to one embodiment, portions of the optimization process may be performed by computer system 100 in response to processor 104 executing one or more sequences of one or more instructions contained in main memory 106. Such instructions may be read into main memory 106 from another computer-readable medium, such as storage device 110. Execution of the sequences of instructions contained in main memory 106 causes processor 104 to perform the process steps described herein. One or more processors in a multi-processing arrangement may be employed to execute the sequences of instructions contained in main memory 106. In an alternative embodiment, hard-wired circuitry may be used in place of or in combination with software instructions. Thus, the description herein is not limited to any specific combination of hardware circuitry and software.

The term “computer-readable medium” as used herein refers to any medium that participates in providing instructions to processor 104 for execution. Such a medium may take many forms, including but not limited to, non-volatile media, volatile media, and transmission media. Non-volatile media include, for example, optical or magnetic disks, such as storage device 110. Volatile media include dynamic memory, such as main memory 106. Transmission media include coaxial cables, copper wire and fiber optics, including the wires that comprise bus 102. Transmission media can also take the form of acoustic or light waves, such as those generated during radio frequency (RF) and infrared (IR) data communications. Common forms of computer-readable media include, for example, a floppy disk, a flexible disk, hard disk, magnetic tape, any other magnetic medium, a CD-ROM, DVD, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave as described hereinafter, or any other medium from which a computer can read.

Various forms of computer-readable media may be involved in carrying one or more sequences of one or more instructions to processor 104 for execution. For example, the instructions may initially be borne on a disk or memory of a remote computer. The remote computer can load the instructions into its dynamic memory and send the instructions over a communications path. Computer system 100 can receive the data from the path and place the data on bus 102. Bus 102 carries the data to main memory 106, from which processor 104 retrieves and executes the instructions. The instructions received by main memory 106 may optionally be stored on storage device 110 either before or after execution by processor 104.

Computer system 100 may include a communication interface 118 coupled to bus 102. Communication interface 118 provides a two-way data communication coupling to a network link 120 that is connected to a network 122. For example, communication interface 118 may provide a wired or wireless data communication connection. In any such implementation, communication interface 118 sends and receives electrical, electromagnetic or optical signals that carry digital data streams representing various types of information.

Network link 120 typically provides data communication through one or more networks to other data devices. For example, network link 120 may provide a connection through network 122 to a host computer 124 or to data equipment operated by an Internet Service Provider (ISP) 126. ISP 126 in turn provides data communication services through the worldwide packet data communication network, now commonly referred to as the “Internet” 128. Network 122 and Internet 128 both use electrical, electromagnetic or optical signals that carry digital data streams. The signals through the various networks and the signals on network link 120 and through communication interface 118, which carry the digital data to and from computer system 100, are exemplary forms of carrier waves transporting the information.

Computer system 100 can send messages and receive data, including program code, through the network(s), network link 120, and communication interface 118. In the Internet example, a server 130 might transmit a requested code for an application program through Internet 128, ISP 126, network 122 and communication interface 118. One such downloaded application may provide for the code to implement a method herein, for example. The received code may be executed by processor 104 as it is received, and/or stored in storage device 110, or other non-volatile storage for later execution. In this manner, computer system 100 may obtain application code in the form of a carrier wave.

FIG. 10 schematically depicts an exemplary lithographic projection apparatus. The apparatus comprises:

- an illumination system IL, to condition a beam B of radiation. In this particular case, the illumination system also comprises a radiation source SO;
- a first object table (e.g., mask table) MT provided with a patterning device holder to hold a patterning device MA (e.g., a reticle), and connected to a first positioner PM to accurately position the patterning device with respect to item PS;
- a second object table (substrate table) WT provided with a substrate holder to hold a substrate W (e.g., a resist-coated silicon wafer), and connected to a second positioner PW to accurately position the substrate with respect to item PS;
- a projection system PS (e.g., a refractive, catoptric or catadioptric optical system) to image an irradiated portion of the patterning device MA onto a target portion C (e.g., comprising one or more dies) of the substrate W.

As depicted herein, the apparatus is of a transmissive type (i.e., has a transmissive mask). However, in general, it may also be of a reflective type (e.g., with a reflective mask). Alternatively, the apparatus may employ another kind of patterning device as an alternative to the use of a classic mask; examples include a programmable mirror array or LCD matrix.

The source SO (e.g., a mercury lamp or excimer laser) produces a beam of radiation. This beam is fed into an illumination system (illuminator) IL, either directly or after having traversed a conditioner, such as a beam expander. The illuminator IL may comprise an adjuster AD configured to set the outer and/or inner radial extent (commonly referred to as σ-outer and σ-inner, respectively) of the intensity distribution in the beam. In addition, it will generally comprise various other components, such as an integrator IN and a condenser CO. In this way, the beam B impinging on the patterning device MA has a desired uniformity and intensity distribution in its cross-section.

It should be noted with regard to FIG. 10 that the source SO may be within the housing of the lithographic projection apparatus (as is often the case when the source SO is a mercury lamp, for example), but that it may also be remote from the lithographic projection apparatus, the radiation beam that it produces being led into the apparatus (e.g., with the aid of suitable directing mirrors BD); this latter scenario is often the case when the source SO is an excimer laser (e.g., based on KrF, ArF or F₂ lasing).

The beam B subsequently intercepts the patterning device MA, which is held on a patterning device table MT. Having traversed the patterning device MA, the beam B passes through the projection system PS, which focuses the beam B onto a target portion C of the substrate W. With the aid of the second positioner PW (and interferometer IF), the substrate table WT can be moved accurately, e.g., so as to position different target portions C in the path of the beam B. Similarly, the first positioner PM can be used to accurately position the patterning device MA with respect to the path of the beam B, e.g., after mechanical retrieval of the patterning device MA from a patterning device library, or during a scan. In general, movement of the object tables MT, WT will be realized with the aid of a long-stroke module (coarse positioning) and a short-stroke module (fine positioning), which are not explicitly depicted in FIG. 10.

Patterning device (e.g., mask) MA and substrate W may be aligned using mask alignment marks M1, M2 and substrate alignment marks P1, P2. Although the substrate alignment marks as illustrated occupy dedicated target portions, they may be located in spaces between target portions (these are known as scribe-lane alignment marks). Similarly, in situations in which more than one die is provided on the patterning device (e.g., mask) MA, the patterning device alignment marks may be located between the dies. Small alignment markers may also be included within dies, in amongst the device features, in which case it is desirable that the markers be as small as possible and not require any different imaging or process conditions than adjacent features.

FIG. 11 schematically depicts another exemplary lithographic projection apparatus 1000. The lithographic projection apparatus 1000 includes:

- a source collector module SO;
- an illumination system (illuminator) IL configured to condition a radiation beam B (e.g., EUV radiation);
- a support structure (e.g., a mask table) MT constructed to support a patterning device (e.g., a mask or a reticle) MA and connected to a first positioner PM configured to accurately position the patterning device;
- a substrate table (e.g., a wafer table) WT constructed to hold a substrate (e.g., a resist-coated wafer) W and connected to a second positioner PW configured to accurately position the substrate; and
- a projection system (e.g., a reflective projection system) PS configured to project a pattern imparted to the radiation beam B by patterning device MA onto a target portion C (e.g., comprising one or more dies) of the substrate W.

As here depicted, the apparatus 1000 is of a reflective type (e.g., employing a reflective mask). It is to be noted that because most materials are absorptive within the EUV wavelength range, the patterning device may have a multilayer reflector comprising, for example, a multi-stack of molybdenum and silicon. In one example, the multi-stack reflector has 40 layer pairs of molybdenum and silicon. Even smaller wavelengths may be produced with X-ray lithography. Since most material is absorptive at EUV and X-ray wavelengths, a thin piece of patterned absorbing material on the patterning device topography (e.g., a TaN absorber on top of the multilayer reflector) defines where features would print (positive resist) or not print (negative resist).

Referring to FIG. 11, the illuminator IL receives an extreme ultraviolet (EUV) radiation beam from the source collector module SO. Methods to produce EUV radiation include, but are not necessarily limited to, converting a material into a plasma state that has at least one element, e.g., xenon, lithium or tin, with one or more emission lines in the EUV range. In one such method, often termed laser produced plasma (“LPP”), the plasma can be produced by irradiating a fuel, such as a droplet, stream or cluster of material having the line-emitting element, with a laser beam. The source collector module SO may be part of an EUV radiation system including a laser, not shown in FIG. 11, to provide the laser beam to excite the fuel. The resulting plasma emits output radiation, e.g., EUV radiation, which is collected using a radiation collector disposed in the source collector module. The laser and the source collector module may be separate entities, for example when a CO₂ laser is used to provide the laser beam for fuel excitation.

In such cases, the laser is not considered to form part of the lithographic apparatus and the radiation beam is passed from the laser to the source collector module with the aid of a beam delivery system comprising, for example, suitable directing mirrors and/or a beam expander. In other cases the source may be an integral part of the source collector module, for example when the source is a discharge produced plasma EUV generator, often termed a DPP source.

The illuminator IL may comprise an adjuster configured to adjust the angular intensity distribution of the radiation beam. Generally, at least the outer and/or inner radial extent (commonly referred to as σ-outer and σ-inner, respectively) of the intensity distribution in a pupil plane of the illuminator can be adjusted. In addition, the illuminator IL may comprise various other components, such as facetted field and pupil mirror devices. The illuminator may be used to condition the radiation beam to have a desired uniformity and intensity distribution in its cross-section.

The radiation beam B is incident on the patterning device (e.g., mask) MA, which is held on the support structure (e.g., mask table) MT, and is patterned by the patterning device. After being reflected from the patterning device (e.g., mask) MA, the radiation beam B passes through the projection system PS, which focuses the beam onto a target portion C of the substrate W. With the aid of the second positioner PW and position sensor PS2 (e.g., an interferometric device, linear encoder or capacitive sensor), the substrate table WT can be moved accurately, e.g., so as to position different target portions C in the path of the radiation beam B. Similarly, the first positioner PM and another position sensor PS1 can be used to accurately position the patterning device (e.g., mask) MA with respect to the path of the radiation beam B. Patterning device (e.g., mask) MA and substrate W may be aligned using patterning device alignment marks M1, M2 and substrate alignment marks P1, P2.

The depicted apparatus could be used in at least one of the following modes:

1. In step mode, the support structure (e.g., mask table) MT and the substrate table WT are kept essentially stationary, while an entire pattern imparted to the radiation beam is projected onto a target portion C at one time (i.e., a single static exposure). The substrate table WT is then shifted in the X and/or Y direction so that a different target portion C can be exposed.

2. In scan mode, the support structure (e.g., mask table) MT and the substrate table WT are scanned synchronously in a given direction (the so-called “scan direction”) while a pattern imparted to the radiation beam is projected onto a target portion C (i.e., a single dynamic exposure). The velocity and direction of the substrate table WT relative to the support structure (e.g., mask table) MT may be determined by the (de-)magnification and image reversal characteristics of the projection system PS.

3. In another mode, the support structure (e.g., mask table) MT is kept essentially stationary holding a programmable patterning device, and the substrate table WT is moved or scanned while a pattern imparted to the radiation beam is projected onto a target portion C. In this mode, generally a pulsed radiation source is employed and the programmable patterning device is updated as required after each movement of the substrate table WT or in between successive radiation pulses during a scan. This mode of operation can be readily applied to maskless lithography that utilizes a programmable patterning device, such as a programmable mirror array of a type as referred to above.
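For the scan mode (mode 2), the dependence of the table velocities on the projection system can be written as a short worked relation. This is a simplified sketch that assumes a projection system of magnification M (commonly 1/4 for a 4x reduction system) and ignores stage-error terms:

```latex
\mathbf{v}_{WT} = \pm\, M \, \mathbf{v}_{MT}
```

The minus sign applies when the projection system reverses the image, in which case the two tables move in opposite directions. For M = 1/4, the substrate table WT moves at one quarter of the speed of the mask table MT.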

Further, the lithographic apparatus may be of a type having two or more tables (e.g., two or more substrate tables, two or more patterning device tables, and/or a substrate table and a table without a substrate). In such “multiple stage” devices the additional tables may be used in parallel, or preparatory steps may be carried out on one or more tables while one or more other tables are being used for exposures. Twin stage lithographic apparatuses are described, for example, in U.S. Pat. No. 5,969,441, incorporated herein by reference in its entirety.

FIG. 12 shows the apparatus 1000 in more detail, including the source collector module SO, the illumination system IL, and the projection system PS. The source collector module SO is constructed and arranged such that a vacuum environment can be maintained in an enclosing structure 220 of the source collector module SO. An EUV radiation emitting plasma 210 may be formed by a discharge produced plasma source. EUV radiation may be produced by a gas or vapor, for example Xe gas, Li vapor or Sn vapor, in which the very hot plasma 210 is created to emit radiation in the EUV range of the electromagnetic spectrum. The very hot plasma 210 is created by, for example, an electrical discharge causing an at least partially ionized plasma. Partial pressures of, for example, 10 Pa of Xe, Li, Sn vapor or any other suitable gas or vapor may be required for efficient generation of the radiation. In an embodiment, a plasma of excited tin (Sn) is provided to produce EUV radiation.

The radiation emitted by the hot plasma 210 is passed from a source chamber 211 into a collector chamber 212 via an optional gas barrier or contaminant trap 230 (in some cases also referred to as a contaminant barrier or foil trap) which is positioned in or behind an opening in source chamber 211. The contaminant trap 230 may include a channel structure. Contaminant trap 230 may also include a gas barrier or a combination of a gas barrier and a channel structure. The contaminant trap or contaminant barrier 230 further indicated herein at least includes a channel structure, as known in the art.

The collector chamber 212 may include a radiation collector CO which may be a so-called grazing incidence collector. Radiation collector CO has an upstream radiation collector side 251 and a downstream radiation collector side 252. Radiation that traverses collector CO can be reflected off a grating spectral filter 240 to be focused in a virtual source point IF along the optical axis indicated by the dot-dashed line ‘O’. The virtual source point IF is commonly referred to as the intermediate focus, and the source collector module is arranged such that the intermediate focus IF is located at or near an opening 221 in the enclosing structure 220. The virtual source point IF is an image of the radiation emitting plasma 210.

Subsequently the radiation traverses the illumination system IL, which may include a facetted field mirror device 22 and a facetted pupil mirror device 24 arranged to provide a desired angular distribution of the radiation beam 21 at the patterning device MA, as well as a desired uniformity of radiation intensity at the patterning device MA. Upon reflection of the beam of radiation 21 at the patterning device MA, held by the support structure MT, a patterned beam 26 is formed and the patterned beam 26 is imaged by the projection system PS via reflective elements 28, 30 onto a substrate W held by the substrate table WT.

More elements than shown may generally be present in illumination optics unit IL and projection system PS. The grating spectral filter 240 may optionally be present, depending upon the type of lithographic apparatus. Further, there may be more mirrors present than those shown in the figures; for example, there may be 1-6 additional reflective elements present in the projection system PS than shown in FIG. 12.

Collector optic CO, as illustrated in FIG. 12, is depicted as a nested collector with grazing incidence reflectors 253, 254 and 255, just as an example of a collector (or collector mirror). The grazing incidence reflectors 253, 254 and 255 are disposed axially symmetric around the optical axis O and a collector optic CO of this type is desirably used in combination with a discharge produced plasma source, often called a DPP source. Alternatively, the source collector module SO may be part of an LPP radiation system.

The term “projection system” used herein should be broadly interpreted as encompassing any type of projection system, including refractive, reflective, catadioptric, magnetic, electromagnetic and electrostatic optical systems, or any combination thereof, as appropriate for the exposure radiation being used, or for other factors such as the use of an immersion liquid or the use of a vacuum.

The lithographic apparatus may also be of a type wherein at least a portion of the substrate may be covered by a liquid having a relatively high refractive index, e.g., water, so as to fill a space between the projection system and the substrate. An immersion liquid may also be applied to other spaces in the lithographic apparatus, for example, between the mask and the projection system. Immersion techniques are well known in the art for increasing the numerical aperture of projection systems. The term “immersion” as used herein does not mean that a structure, such as a substrate, must be submerged in liquid, but rather only means that liquid is located between the projection system and the substrate during exposure.

The concepts disclosed herein may be used to simulate or mathematically model any device manufacturing process involving a lithographic apparatus, and may be especially useful with emerging imaging technologies capable of producing increasingly short wavelengths. Emerging technologies already in use include deep ultraviolet (DUV) lithography, which is capable of producing a 193 nm wavelength with an ArF laser, and even a 157 nm wavelength with a fluorine laser. Moreover, EUV lithography is capable of producing wavelengths within a range of 5-20 nm.

While the concepts disclosed herein may be used for device manufacturing on a substrate such as a silicon wafer, it shall be understood that the disclosed concepts may be used with any type of lithographic imaging system, e.g., those used for imaging on substrates other than silicon wafers.

The patterning device referred to above comprises or can form a design layout. The design layout can be generated utilizing a CAD (computer-aided design) program. This process is often referred to as EDA (electronic design automation). Most CAD programs follow a set of predetermined design rules in order to create functional design layouts/patterning devices. These rules are set by processing and design limitations. For example, design rules define the space tolerance between circuit devices (such as gates, capacitors, etc.) or interconnect lines, so as to ensure that the circuit devices or lines do not interact with one another in an undesirable way. The design rule limitations are typically referred to as “critical dimensions” (CD). A critical dimension of a circuit can be defined as the smallest width of a line or hole or the smallest space between two lines or two holes. Thus, the CD determines the overall size and density of the designed circuit. Of course, one of the goals in integrated circuit fabrication is to faithfully reproduce the original circuit design on the substrate (via the patterning device).
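The CD definition above (smallest line width or smallest space) can be sketched for the simplest case: a one-dimensional cut through a set of parallel lines. This is a minimal illustration under stated assumptions: the lines are given as hypothetical non-overlapping `(left edge, right edge)` intervals in nm, and holes and 2-D geometry are ignored.

```python
from typing import List, Tuple

# (left edge, right edge) of one line along a cut, in nm (assumed format).
Interval = Tuple[float, float]


def critical_dimension(lines: List[Interval]) -> float:
    """Smallest line width or smallest space between adjacent lines along one
    cut; assumes the intervals do not overlap."""
    ordered = sorted(lines)
    widths = [right - left for left, right in ordered]
    spaces = [ordered[i + 1][0] - ordered[i][1] for i in range(len(ordered) - 1)]
    return min(widths + spaces)


# Hypothetical cut through three lines (values in nm).
cut = [(0, 45), (90, 130), (180, 225)]
print(critical_dimension(cut))  # → 40 (the middle line's width)
```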

The term “mask” or “patterning device” as employed in this text may be broadly interpreted as referring to a generic patterning device that can be used to endow an incoming radiation beam with a patterned cross-section, corresponding to a pattern that is to be created in a target portion of the substrate; the term “light valve” can also be used in this context. Besides the classic mask (transmissive or reflective; binary, phase-shifting, hybrid, etc.), examples of other such patterning devices include:

-   a programmable mirror array. An example of such a device is a matrix-addressable surface having a viscoelastic control layer and a reflective surface. The basic principle behind such an apparatus is that (for example) addressed areas of the reflective surface reflect incident radiation as diffracted radiation, whereas unaddressed areas reflect incident radiation as undiffracted radiation. Using an appropriate filter, the undiffracted radiation can be filtered out of the reflected beam, leaving only the diffracted radiation behind; in this manner, the beam becomes patterned according to the addressing pattern of the matrix-addressable surface. The required matrix addressing can be performed using suitable electronic means. More information on such mirror arrays can be found, for example, in U.S. Pat. Nos. 5,296,891 and 5,523,193, which are incorporated herein by reference.
-   a programmable LCD array. An example of such a construction is given in U.S. Pat. No. 5,229,872, which is incorporated herein by reference.

As noted, microlithography is a significant step in the manufacturing of devices such as ICs, where patterns formed on substrates define functional elements of the ICs, such as microprocessors, memory chips, etc. Similar lithographic techniques are also used in the formation of flat panel displays, micro-electro mechanical systems (MEMS) and other devices.

The process in which features with dimensions smaller than the classical resolution limit of a lithographic projection apparatus are printed is commonly known as low-k₁ lithography, according to the resolution formula CD = k₁×λ/NA, where λ is the wavelength of radiation employed (currently in most cases 248 nm or 193 nm), NA is the numerical aperture of projection optics in the lithographic projection apparatus, CD is the “critical dimension” (generally the smallest feature size printed), and k₁ is an empirical resolution factor. In general, the smaller k₁, the more difficult it becomes to reproduce a pattern on the substrate that resembles the shape and dimensions planned by a circuit designer in order to achieve particular electrical functionality and performance. To overcome these difficulties, sophisticated fine-tuning steps are applied to the lithographic projection apparatus and/or design layout. These include, for example but not limited to, optimization of NA and optical coherence settings, customized illumination schemes, use of phase shifting patterning devices, optical proximity correction (OPC, sometimes also referred to as “optical and process correction”) in the design layout, or other methods generally defined as “resolution enhancement techniques” (RET).
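The resolution formula can be evaluated in both directions: the CD reachable for a given k₁, and the k₁ a target CD demands. The sketch below uses illustrative numbers only (193 nm radiation, an assumed immersion NA of 1.35 and arbitrary k₁ and CD values); it is not tied to any particular apparatus.

```python
def resolution_cd(k1: float, wavelength_nm: float, na: float) -> float:
    """CD = k1 * lambda / NA."""
    return k1 * wavelength_nm / na


def required_k1(cd_nm: float, wavelength_nm: float, na: float) -> float:
    """Rearranged form k1 = CD * NA / lambda; a smaller k1 means a harder,
    more 'low-k1' printing regime."""
    return cd_nm * na / wavelength_nm


# Illustrative numbers only: 193 nm radiation with an assumed NA of 1.35.
print(round(resolution_cd(0.3, 193.0, 1.35), 1))  # → 42.9 (nm)
print(round(required_k1(40.0, 193.0, 1.35), 3))   # → 0.28
```

Printing 40 nm features under these assumptions means operating at k₁ ≈ 0.28, which is the regime in which the RET listed above become necessary.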

As an example, OPC addresses the fact that the final size and placement of an image of the design layout projected on the substrate will not be identical to, or simply depend only on, the size and placement of the design layout on the patterning device. A person skilled in the art will recognize that, especially in the context of lithography simulation/optimization, the terms “mask”/“patterning device” and “design layout” can be used interchangeably, as in lithography simulation/optimization, a physical patterning device is not necessarily used but a design layout can be used to represent a physical patterning device. For the small feature sizes and high feature densities present in some design layouts, the position of a particular edge of a given feature will be influenced to a certain extent by the presence or absence of other adjacent features. These proximity effects arise from minute amounts of radiation coupled from one feature to another and/or non-geometrical optical effects such as diffraction and interference. Similarly, proximity effects may arise from diffusion and other chemical effects during post-exposure bake (PEB), resist development, and etching that generally follow lithography.

To help ensure that the projected image of the design layout is in accordance with requirements of a given target circuit design, proximity effects may be predicted and compensated for, using sophisticated numerical models, corrections or pre-distortions of the design layout. The article “Full-Chip Lithography Simulation and Design Analysis—How OPC Is Changing IC Design”, C. Spence, Proc. SPIE, Vol. 5751, pp. 1-14 (2005) provides an overview of current “model-based” optical proximity correction processes. In a typical high-end design almost every feature of the design layout has some modification in order to achieve high fidelity of the projected image to the target design. These modifications may include shifting or biasing of edge positions or line widths as well as application of “assist” features that are intended to assist projection of other features.

Applying OPC is generally not an “exact science”, but an empirical, iterative process that does not always compensate for all possible proximity effects. Therefore, the effect of OPC, e.g., design layouts after application of OPC and any other RET, should be verified by design inspection, i.e., intensive full-chip simulation using calibrated numerical process models, in order to minimize the possibility of design flaws being built into the patterning device pattern.
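The iterative character of OPC can be illustrated with a deliberately simple toy: repeatedly move a layout edge by the residual print error, then verify the corrected layout by re-simulating it. This is a minimal sketch, not a production OPC algorithm. `toy_printed_width` is a hypothetical one-parameter process model standing in for a calibrated numerical model, and `iterative_bias` is likewise a name made up for this illustration.

```python
import math
from typing import Callable


def toy_printed_width(design_width_nm: float) -> float:
    """Hypothetical process model: narrow features print disproportionately
    small, a crude stand-in for proximity and resist effects."""
    return design_width_nm - 30.0 * math.exp(-design_width_nm / 60.0)


def iterative_bias(target_nm: float, model: Callable[[float], float],
                   max_iter: int = 50, tol_nm: float = 0.01) -> float:
    """Move the layout edge by the residual print error until the simulated
    width matches the target within tol_nm (or max_iter is reached)."""
    design = target_nm  # start from the uncorrected target shape
    for _ in range(max_iter):
        error = target_nm - model(design)
        if abs(error) < tol_nm:
            break
        design += error  # bias the layout by the remaining error
    return design


corrected = iterative_bias(45.0, toy_printed_width)
# Verification step: re-simulate the corrected layout against the target.
print(f"layout width {corrected:.1f} nm prints at {toy_printed_width(corrected):.2f} nm")
```

The update `design += error` converges only when the model's slope lies between 0 and 2; in this toy the slope is between 1 and 1.5, so each iteration shrinks the error. With a real process model there is no such guarantee, which is one reason the result has to be verified by simulation.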

Both OPC and full-chip RET verification may be based on numerical modeling systems and methods as described, for example, in U.S. Patent Application Publication No. US 2005-0076322 and an article titled “Optimized Hardware and Software For Fast, Full Chip Simulation”, by Y. Cao et al., Proc. SPIE, Vol. 5754, 405 (2005).

One RET is related to adjustment of the global bias of the design layout. The global bias is the difference between the patterns in the design layout and the patterns intended to print on the substrate. For example, a circular pattern of 25 nm diameter may be printed on the substrate by a 50 nm diameter pattern in the design layout or by a 20 nm diameter pattern in the design layout but with high dose.
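The global-bias example above reduces to a subtraction. The sketch below assumes the sign convention "layout size minus intended printed size"; the text does not fix a sign, so this choice is an assumption.

```python
def global_bias(layout_size_nm: float, printed_size_nm: float) -> float:
    """Design-layout size minus the size intended to print on the substrate
    (sign convention assumed for this illustration)."""
    return layout_size_nm - printed_size_nm


# The two options described above for printing a 25 nm diameter circle.
print(global_bias(50.0, 25.0))  # → 25.0: oversized layout pattern
print(global_bias(20.0, 25.0))  # → -5.0: undersized pattern, printed larger with high dose
```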

In addition to optimization of design layouts or patterning devices (e.g., OPC), the illumination source can also be optimized, either jointly with patterning device optimization or separately, in an effort to improve the overall lithography fidelity. The terms “illumination source” and “source” are used interchangeably in this document. As is known, off-axis illumination, such as annular, quadrupole, and dipole, is a proven way to resolve fine structures (i.e., target features) contained in the patterning device. However, when compared to a traditional illumination source, an off-axis illumination source usually provides less radiation intensity for the aerial image (AI). Thus, it becomes desirable to attempt to optimize the illumination source to achieve the optimal balance between finer resolution and reduced radiation intensity.

Numerous illumination source optimization approaches can be found, for example, in an article by Rosenbluth et al., titled “Optimum Mask and Source Patterns to Print A Given Shape”, Journal of Microlithography, Microfabrication, Microsystems 1(1), pp. 13-20, (2002). In that approach, the source is partitioned into several regions, each of which corresponds to a certain region of the pupil spectrum. Then, the source distribution is assumed to be uniform in each source region and the brightness of each region is optimized for the process window. In another example, set forth in an article by Granik, titled “Source Optimization for Image Fidelity and Throughput”, Journal of Microlithography, Microfabrication, Microsystems 3(4), pp. 509-522, (2004), several existing source optimization approaches are reviewed and a method based on illuminator pixels is proposed that converts the source optimization problem into a series of non-negative least-squares optimizations.
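A single non-negative least-squares (NNLS) subproblem of the kind mentioned above can be sketched as follows. Because the aerial image is an incoherent sum of contributions from individual source points, image intensity at an evaluation point is linear in the source pixel intensities. Pixel brightness must also be non-negative. The matrix `A` below is a made-up 3-point by 2-pixel example, and the projected-gradient solver is a generic choice for illustration; it is not necessarily the solver used in the cited article.

```python
from typing import List

Matrix = List[List[float]]


def nnls_projected_gradient(A: Matrix, b: List[float], lr: float = 0.05,
                            steps: int = 5000) -> List[float]:
    """Approximately solve min ||A x - b||^2 subject to x >= 0.

    Each iteration takes a gradient step on the squared residual and then
    projects onto the feasible set by clipping negative entries to zero.
    A sufficient condition for convergence is lr <= 1 / (2 * largest
    eigenvalue of A^T A).
    """
    n_rows, n_cols = len(A), len(A[0])
    x = [0.0] * n_cols
    for _ in range(steps):
        residual = [sum(A[i][j] * x[j] for j in range(n_cols)) - b[i]
                    for i in range(n_rows)]
        grad = [2.0 * sum(A[i][j] * residual[i] for i in range(n_rows))
                for j in range(n_cols)]
        x = [max(0.0, x[j] - lr * grad[j]) for j in range(n_cols)]
    return x


# Hypothetical 3 evaluation points x 2 source pixels: A[i][j] is the image
# intensity at point i produced by unit intensity in source pixel j.
A = [[1.0, 0.2],
     [0.5, 0.5],
     [0.2, 1.0]]
target = [0.66, 0.45, 0.42]  # consistent with pixel intensities (0.6, 0.3)
print(nnls_projected_gradient(A, target))  # approximately [0.6, 0.3]
```

If the target could only be matched with a negative pixel intensity, the projection step clamps that pixel to zero and the remaining pixels absorb the fit. This is exactly the physical constraint that makes the problem NNLS rather than ordinary least squares.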

For low k₁ photolithography, optimization of both the source and patterning device is useful to ensure a viable process window for projection of critical circuit patterns. Some algorithms discretize illumination into independent source points and the patterning device pattern into diffraction orders in the spatial frequency domain, and separately formulate a cost function (which is defined as a function of selected design variables) based on process window metrics such as exposure latitude, which could be predicted by optical imaging models from source point intensities and patterning device diffraction orders. The term “design variables” as used herein comprises a set of parameters of an apparatus or a device manufacturing process, for example, parameters a user of the lithographic apparatus can adjust, or image characteristics a user can adjust by adjusting those parameters. It should be appreciated that any characteristics of a device manufacturing process, including those of the source, the patterning device, the projection optics, and/or resist characteristics can be among the design variables in the optimization. The cost function is often a non-linear function of the design variables. Then standard optimization techniques are used to minimize the cost function.
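The last two sentences above (a non-linear cost over design variables, minimized by a standard technique) can be sketched with plain gradient descent using finite-difference gradients. Everything specific here is hypothetical: the two design variables (dose and focus offsets), the made-up CD response and the focus penalty in `toy_cost` stand in for a real process-window-based cost computed by an imaging model.

```python
from typing import Callable, List


def gradient_descent(cost: Callable[[List[float]], float], x0: List[float],
                     lr: float = 0.01, steps: int = 5000,
                     h: float = 1e-6) -> List[float]:
    """Minimize `cost` by gradient descent with central finite-difference gradients."""
    x = list(x0)
    for _ in range(steps):
        grad = []
        for i in range(len(x)):
            up, down = x.copy(), x.copy()
            up[i] += h
            down[i] -= h
            grad.append((cost(up) - cost(down)) / (2.0 * h))
        x = [xi - lr * gi for xi, gi in zip(x, grad)]
    return x


def toy_cost(design_vars: List[float]) -> float:
    """Hypothetical non-linear cost over two design variables (dose, focus offsets)."""
    dose, focus = design_vars
    cd = 45.0 + 2.0 * dose + 3.0 * focus ** 2  # made-up CD response in nm
    return (cd - 40.0) ** 2 + focus ** 2        # CD error plus a focus penalty


best = gradient_descent(toy_cost, [0.0, 0.2])
print([round(v, 3) for v in best])  # dose offset near -2.5, focus offset near 0
```

Finite differences keep the optimizer independent of how the cost is computed, at the price of two cost evaluations per design variable per step. With many design variables, analytic or adjoint gradients are usually preferred.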

A source and patterning device (design layout) optimization method and system that allows for simultaneous optimization of the source and patterning device using a cost function without constraints and within a practicable amount of time is described in a commonly assigned PCT Patent Application Publication No. WO2010/059954, which is hereby incorporated by reference in its entirety.

Another source and mask optimization method and system that involves optimizing the source by adjusting pixels of the source is described in U.S. Patent Application Publication No. 2010/0315614, which is hereby incorporated by reference in its entirety.

The term “projection optics” as used herein should be broadly interpreted as encompassing various types of optical systems, including refractive optics, reflective optics, apertures and catadioptric optics, for example. The term “projection optics” may also include components operating according to any of these design types for directing, shaping or controlling the projection beam of radiation, collectively or singularly. The term “projection optics” may include any optical component in the lithographic projection apparatus, no matter where the optical component is located on an optical path of the lithographic projection apparatus. Projection optics may include optical components for shaping, adjusting and/or projecting radiation from the source before the radiation passes the patterning device, and/or optical components for shaping, adjusting and/or projecting the radiation after the radiation passes the patterning device. The projection optics generally exclude the source and the patterning device.

Although specific reference may have been made above to the use of embodiments in the context of optical lithography, it will be appreciated that an embodiment of the invention may be used in other applications, for example imprint lithography, and, where the context allows, is not limited to optical lithography. In imprint lithography, a topography in a patterning device defines the pattern created on a substrate. The topography of the patterning device may be pressed into a layer of resist supplied to the substrate, whereupon the resist is cured by applying electromagnetic radiation, heat, pressure or a combination thereof. The patterning device is moved out of the resist, leaving a pattern in it after the resist is cured. Thus, a lithographic apparatus using the imprint technology typically includes a template holder to hold an imprint template, a substrate table to hold a substrate and one or more actuators to cause relative movement between the substrate and the imprint template so that the pattern of the imprint template can be imprinted onto a layer of the substrate.

Embodiments of the invention are further described using the following clauses:

1. A computer-implemented defect prediction method for a device manufacturing process involving production substrates processed by a lithographic apparatus, the method comprising: training a classification model using a training set comprising measured or determined values of a process parameter associated with the production substrates processed by the device manufacturing process and an indication regarding existence of defects associated with the production substrates processed in the device manufacturing process under the values of the process parameter; and producing an output from the classification model that indicates a prediction of a defect for a substrate.

2. The method of clause 1, comprising training the classification model using a further training set comprising further measured or determined values of a process parameter associated with production substrates processed by the device manufacturing process and an indication regarding existence of defects associated with the production substrates processed in the device manufacturing process under the further values of the process parameter.

3. The method of clause 2, wherein at least some of the further values are generated after training the classification model using the measured or determined values.

4. The method of clause 2 or clause 3, wherein the further training set comprises at least a portion of the measured or determined values in addition to the further values.

5. The method of any of clauses 1 to 4, further comprising repeatedly performing the training based on further measured or determined values of the process parameter associated with further production substrates processed by the device manufacturing process.

6. The method of any of clauses 1 to 5, further comprising calculating a probability of the defect for the substrate using the classification model.

7. The method of clause 6, further comprising adjusting a parameter of the device manufacturing process, a parameter of a layout to be patterned onto a substrate, or both, using the probability.

8. The method of any of clauses 1 to 7, wherein the indication regarding existence of the defect comprises a determination by an optical measuring tool or operator input or determined from yield data or electronic testing data.

9. The method of any of clauses 1 to 8, wherein the indication regarding existence of the defect comprises a determination by an empirical or computational model.

10. The method of any of clauses 1 to 9, wherein the indication regarding existence of the defect comprises determination by a user of the lithographic apparatus.

11. The method of any of clauses 1 to 10, wherein the indication regarding existence of the defect comprises a determination after patterning a layout on each die of a substrate or each substrate.

12. The method of any of clauses 1 to 11, wherein the classification model involves logistic regression, kernel logistic regression, support vector machine or import vector machine.

13. The method of any of clauses 1 to 12, wherein a number of categories of the classification model is two.

14. The method of clause 13, wherein the categories comprise existence of defects and non-existence of defects.

15. The method of any of clauses 1 to 14, wherein the defects are one or more selected from a group consisting of necking, line pull back, line thinning, CD, overlapping and bridging.

16. The method of any of clauses 1 to 15, wherein the parameter of the device manufacturing process is one or more selected from a group consisting of a characteristic of a radiation source of the lithographic apparatus, a characteristic of projection optics of the lithographic apparatus, dose, focus, a characteristic of a resist, a characteristic of development of the resist, a characteristic of post-exposure baking of the resist, and a characteristic of etching of a substrate.

17. The method of any of clauses 1 to 16, further comprising training the classification model using values of a process parameter simulated using a parameter of a layout to be patterned on a substrate and an indication regarding existence of defects associated with the simulated values of the process parameter.

18. The method of any of clauses 1 to 17, further comprising training the classification model using values of the process parameter measured by a metrology tool.

19. The method of any of clauses 1 to 18, further comprising determining the indication regarding existence of the defects associated with the values of the process parameter.

20. The method of any of clauses 1 to 19, further comprising measuring or determining the values of the process parameter, the values being one or more selected from: measured values from a metrology tool, yield data, or values from a lithographic apparatus.

21. The method of any of clauses 1 to 20, wherein the device manufacturing process is an etch process.

22. The method of any of clauses 1 to 20, wherein the device manufacturing process comprises a lithographic patterning process.

23. A method of training a classification model, the method comprising:

predicting a defect in or on a substrate using the classification model, the classification model having, as an independent variable, a process parameter of a device manufacturing process for lithographically exposed substrates and/or a layout parameter of a pattern to be provided on a substrate using a lithographic apparatus;

receiving information regarding existence of a defect for a measured or determined value of the process parameter and/or layout parameter; and

training the classification model based on the predicted defect and the information regarding existence of the defect for the measured or determined value of the process parameter and/or layout parameter.

24. The method of clause 23, wherein the information regarding existence of the defect comprises a plurality of values of the process parameter of the device manufacturing process measured by an optical measuring tool.

25. The method of clause 23 or clause 24, further comprising repeating the predicting, receiving and training based on data measured during the device manufacturing process from a plurality of substrates processed by the device manufacturing process.

26. The method of any of clauses 23 to 25, further comprising adjusting a parameter of the device manufacturing process, a parameter of a layout to be patterned onto a substrate, or both, using an output of the classification model.

27. The method of any of clauses 23 to 26, wherein the classification model involves logistic regression, kernel logistic regression, support vector machine or import vector machine.

28. A computer-implemented method of producing a classification model to facilitate defect prediction in a device manufacturing process involving production substrates processed by a lithographic apparatus, the method comprising training the classification model using a training set comprising measured or determined values of a process parameter of a plurality of substrates processed by the device manufacturing process and an indication regarding existence of defects associated with the values of the process parameter.

29. The method of clause 28, further comprising predicting a defect in a substrate using the classification model.

30. The method of clause 29, further comprising providing an estimate of the probability of the defect.

31. A computer program product comprising a computer readable medium having instructions recorded thereon, the instructions when executed by a computer implementing the method of any of the above clauses.
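The clauses above describe a two-category (defect / no defect) classification model trained on process-parameter values, retrained as further data arrive, and used to output a defect probability that can guide parameter adjustment. The sketch below shows one way this could look with plain logistic regression, one of the model types named in clause 12. It is a minimal illustration, not the claimed implementation. The class `DefectClassifier`, the feature map (including a hand-chosen quadratic focus term), the normalized dose/focus units and the synthetic labels are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical process parameters per exposure field: (dose offset, focus
# offset), both in arbitrary normalized units chosen for this illustration.
Sample = Tuple[float, float]


def features(s: Sample) -> List[float]:
    """Bias, linear terms and a quadratic focus term (an assumed feature
    choice so that defects can occur on both sides of best focus)."""
    dose, focus = s
    return [1.0, dose, focus, focus * focus]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class DefectClassifier:
    """Two-category (defect / no defect) logistic-regression sketch."""

    def __init__(self, n_features: int = 4) -> None:
        self.w = [0.0] * n_features

    def predict_proba(self, s: Sample) -> float:
        """Estimated probability that a defect occurs at process setting s."""
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, features(s))))

    def fit(self, samples: Sequence[Sample], labels: Sequence[int],
            epochs: int = 2000, lr: float = 0.1,
            l2: float = 1e-3) -> "DefectClassifier":
        """Batch gradient descent on the L2-regularized log-loss. Weights are
        kept between calls, so calling fit again with further data retrains
        the model starting from its current state."""
        n = len(samples)
        for _ in range(epochs):
            grad = [l2 * wi for wi in self.w]
            for s, y in zip(samples, labels):
                err = self.predict_proba(s) - y
                for j, xj in enumerate(features(s)):
                    grad[j] += err * xj / n
            self.w = [wi - lr * gj for wi, gj in zip(self.w, grad)]
        return self


# Synthetic training set (not real data): a 5 x 5 dose/focus grid in which
# fields at the outermost focus settings are labelled defective (1).
grid = [(float(d), float(f)) for d in range(-2, 3) for f in range(-2, 3)]
labels = [1 if abs(f) == 2 else 0 for _, f in grid]
model = DefectClassifier().fit(grid, labels)

# Retraining with a further (hypothetical) observation: a defect found at a
# new, higher dose setting, combined with the earlier data.
before = model.predict_proba((3.0, 1.0))
model.fit(grid + [(3.0, 1.0)], labels + [1])
after = model.predict_proba((3.0, 1.0))

# Using the probability to pick a process setting: the candidate with the
# lowest predicted defect probability.
best_setting = min(grid, key=model.predict_proba)
print(f"P(defect) at (3, 1): {before:.3f} -> {after:.3f}; best setting {best_setting}")
```

The explicit `focus * focus` feature lets a linear classifier capture a defect region on both sides of best focus. Kernel logistic regression, a support vector machine or an import vector machine (clause 12) would learn such non-linear boundaries without hand-chosen features.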

The descriptions above are intended to be illustrative, not limiting. Thus, it will be apparent to one skilled in the art that modifications may be made as described without departing from the scope of the claims set out below.

What is claimed is:
 1. A computer-implemented defect prediction method for a device manufacturing process involving production substrates processed by a lithographic apparatus, the method comprising: training a classification model using a training set comprising measured or determined values of a process parameter associated with the production substrates processed by the device manufacturing process and an indication regarding existence of defects associated with the production substrates processed in the device manufacturing process under the values of the process parameter; and producing an output from the classification model that indicates a prediction of a defect for a substrate.
 2. The method of claim 1, comprising training the classification model using a further training set comprising further measured or determined values of a process parameter associated with production substrates processed by the device manufacturing process and an indication regarding existence of defects associated with the production substrates processed in the device manufacturing process under the further values of the process parameter.
 3. The method of claim 2, wherein at least some of the further values are generated after training the classification model using the measured or determined values.
 4. The method of claim 2, wherein the further training set comprises at least a portion of the measured or determined values in addition to the further values.
 5. The method of claim 1, further comprising repeatedly performing the training based on further measured or determined values of the process parameter associated with further production substrates processed by the device manufacturing process.
 6. The method of claim 1, further comprising calculating a probability of the defect for the substrate using the classification model.
 7. The method of claim 6, further comprising adjusting a parameter of the device manufacturing process, a parameter of a layout to be patterned onto a substrate, or both, using the probability.
 8. The method of claim 1, wherein the indication regarding existence of the defect comprises a determination by an optical measuring tool or operator input or determined from yield data or electronic testing data.
 9. The method of claim 1, wherein the indication regarding existence of the defect comprises a determination by an empirical or computational model, or determination by a user of the lithographic apparatus, or a determination after patterning a layout on each die of a substrate or each substrate.
 10. The method of claim 1, wherein the classification model involves logistic regression, kernel logistic regression, support vector machine or import vector machine.
 11. The method of claim 1, wherein a number of categories of the classification model is two.
 12. The method of claim 11, wherein the categories comprise existence of defects and non-existence of defects.
 13. The method of claim 1, wherein the defects are one or more selected from a group consisting of necking, line pull back, line thinning, CD, overlapping and bridging.
 14. A method of training a classification model, the method comprising: predicting a defect in or on a substrate using the classification model, the classification model having, as an independent variable, a process parameter of a device manufacturing process for lithographically exposed substrates and/or a layout parameter of a pattern to be provided on a substrate using a lithographic apparatus; receiving information regarding existence of a defect for a measured or determined value of the process parameter and/or layout parameter; and training the classification model based on the predicted defect and the information regarding existence of the defect for the measured or determined value of the process parameter and/or layout parameter.
 15. A computer-implemented method of producing a classification model to facilitate defect prediction in a device manufacturing process involving production substrates processed by a lithographic apparatus, the method comprising training the classification model using a training set comprising measured or determined values of a process parameter of a plurality of substrates processed by the device manufacturing process and an indication regarding existence of defects associated with the values of the process parameter.